
149 research outputs found

Teaching Data Science. Constructing Pillars in a Fluid Field

Author: Helmer Sven
Publication date: 31/08/2022

A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality

Authors: Helmer Sven; Moerkotte Guido
Publication date: 01/01/1999

We review and study the performance of four index structures for set-valued attributes, each designed to speed up set equality, subset, and superset queries. All four are based on traditional techniques, namely signatures and inverted files. More specifically, we consider sequential signature files, signature trees, extendible signature hashing, and a B-tree based implementation of inverted lists. The latter is refined by a compression scheme in order to keep space requirements within acceptable bounds. The performance study is based on real implementations subjected to a benchmark accounting for different set sizes, domain sizes, and data distributions (uniform and skewed).

MAnnheim DOCument Server
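
To make the signature idea in the abstract above concrete, here is a minimal sketch of a sequential signature file answering "which stored sets contain the query set?". It is an illustration under stated assumptions, not the implementation benchmarked in the paper: the signature width, the number of bits per element, the CRC-based hash, and the names `build_index` and `containing_sets` are all hypothetical choices made for this example.

```python
import zlib

SIG_BITS = 64       # assumed signature width (not taken from the paper)
BITS_PER_ELEM = 3   # assumed number of bits set per element

def element_bits(element):
    """Map one element to a bit mask with up to BITS_PER_ELEM bits set."""
    mask = 0
    for i in range(BITS_PER_ELEM):
        h = zlib.crc32(f"{i}:{element}".encode())
        mask |= 1 << (h % SIG_BITS)
    return mask

def signature(items):
    """Superimpose (bitwise OR) the masks of all elements of a set."""
    sig = 0
    for e in items:
        sig |= element_bits(e)
    return sig

def build_index(sets):
    """Sequential signature file: one (signature, set) pair per record."""
    return [(signature(s), s) for s in sets]

def containing_sets(index, query):
    """Return positions of stored sets that contain every element of query."""
    qsig = signature(query)
    hits = []
    for pos, (sig, items) in enumerate(index):
        # Signature filter: a necessary condition, so it may let false drops through.
        if (sig & qsig) == qsig:
            # Verify against the real set to discard false drops.
            if query <= items:
                hits.append(pos)
    return hits

index = build_index([{"a", "b", "c"}, {"b", "c"}, {"d"}])
print(containing_sets(index, {"b", "c"}))
```

The signature comparison only filters candidates; the final check against the stored set is what guarantees a correct answer.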

Evaluation of Main Memory Join Algorithms for Joins with Set Comparison Predicates

Authors: Helmer Sven; Moerkotte Guido
Publication date: 01/01/1996

Current data models like the NF2 model and object-oriented models support set-valued attributes. Hence, it becomes possible to have join predicates based on set comparison. This paper introduces and evaluates several main memory algorithms for evaluating this kind of join efficiently. More specifically, we concentrate on the set equality and the subset predicates.

MAnnheim DOCument Server

Compiling Away Set Containment and Intersection Joins

Authors: Helmer Sven; Moerkotte Guido
Publication date: 01/01/2002

We investigate the effect of query rewriting on joins involving set-valued attributes in object-relational database management systems. We show that by unnesting set-valued attributes (that are stored in an internal nested representation) prior to the actual set containment or intersection join, we can improve the performance of query evaluation by an order of magnitude. Using example query evaluation plans, we show the increased possibilities for the query optimizer.

MAnnheim DOCument Server
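
The general idea of unnesting before a set containment join, as described in the abstract above, can be sketched as follows: flatten one input into (element, id) pairs, match the other input's elements against them, and keep the pairs whose match count equals the size of the candidate subset. This is a hedged illustration of that idea only, not one of the query evaluation plans from the paper; the function name `containment_join` and the dictionary-based inputs are assumptions made for the example.

```python
from collections import Counter, defaultdict

def containment_join(R, S):
    """Return sorted (r_id, s_id) pairs such that R[r_id] is a subset of S[s_id]."""
    # Unnest S: one (element -> s_id) entry per element of every set in S.
    s_by_elem = defaultdict(list)
    for s_id, s_set in S.items():
        for e in s_set:
            s_by_elem[e].append(s_id)

    result = []
    for r_id, r_set in R.items():
        if not r_set:
            # The empty set is contained in every set.
            result.extend((r_id, s_id) for s_id in S)
            continue
        # Equi-join on the element, then count matches per s_id.
        matches = Counter()
        for e in r_set:
            for s_id in s_by_elem.get(e, ()):
                matches[s_id] += 1
        # R[r_id] is a subset exactly when all of its elements matched.
        result.extend((r_id, s_id) for s_id, n in matches.items() if n == len(r_set))
    return sorted(result)

R = {1: {"a", "b"}, 2: {"c"}}
S = {10: {"a", "b", "c"}, 20: {"a"}}
print(containment_join(R, S))
```

Because the unnested form turns the set predicate into an ordinary equi-join followed by grouping, standard join and aggregation operators become applicable, which is one way to read the abstract's remark about increased possibilities for the optimizer.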

Inferring offline hierarchical ties from online social networks

Authors: Helmer Sven; Jaber Mohammad; Papapetrou Panagiotis; Wood Peter T.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

Social networks can represent many different types of relationships between actors, some explicit and some implicit. For example, email communications between users may be represented explicitly in a network, while managerial relationships may not. In this paper we focus on analyzing explicit interactions among actors in order to detect hierarchical social relationships that may be implicit. We start by employing three well-known ranking-based methods, PageRank, Degree Centrality, and Rooted-PageRank (RPR), to infer such implicit relationships from interactions between actors. Then we propose two novel approaches which take into account the time dimension of interactions in the process of detecting hierarchical ties. We experiment on two datasets: the Enron email dataset, to infer manager-subordinate relationships from email exchanges, and a scientific publication co-authorship dataset, to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments show that time-based methods perform considerably better than ranking-based methods. In the Enron dataset, they detect 48% of manager-subordinate ties versus 32% found by Rooted-PageRank. Similarly, in the co-authorship dataset, they detect 62% of advisor-advisee ties compared to only 39% by Rooted-PageRank.

Crossref

Birkbeck Institutional Research Online

Indexing a Fuzzy Database Using the Technique of Superimposed Coding - Cost Models and Measurements

Authors: Boss Birgit; Helmer Sven
Publication date: 01/01/1996

Recently, new applications have emerged that require database management systems with uncertainty capabilities. Many of the existing approaches to modelling uncertainty in database management systems are based on the theory of fuzzy sets. High performance is a necessary precondition for the acceptance of such systems by end users. However, performance issues have been largely neglected in research on fuzzy database management systems so far. In this article they are addressed explicitly. We propose new index structures for fuzzy database management systems based on the well-known technique of superimposed coding, together with detailed cost models. The correctness of the cost models as well as the efficiency of the proposed index structures is validated by a number of measurements on experimental fuzzy databases.

MAnnheim DOCument Server

Nested Queries and Quantifiers in an Ordered Context

Authors: Helmer Sven; May Norman; Moerkotte Guido
Publication date: 01/01/2003

We present algebraic equivalences that allow nested algebraic expressions to be unnested for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible with our approach.

CiteSeerX

Crossref

MAnnheim DOCument Server

Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships

Authors: Helmer Sven; Moerkotte Guido; Westmann Till
Publication date: 01/01/1997

Time of creation is one of the predominant (often implicit) clustering strategies, found not only in Data Warehouse systems: line items are created together with their corresponding order, objects are created together with their subparts, and so on. The newly created data is then appended to the existing data. We present a new join algorithm, called Diag-Join, which exploits time-of-creation clustering. The performance evaluation reveals its superiority over standard join algorithms like nested-loop join and GRACE hash join. We also present an analytical cost model for Diag-Join.

CiteSeerX

MAnnheim DOCument Server